Goto

Collaborating Authors

 lesson plan


Human Experts' Evaluation of Generative AI for Contextualizing STEAM Education in the Global South

Nyaaba, Matthew, Nabang, Macharious, Kyeremeh, Patrick, Nantomah, Ibrahim, Owusu-Fordjour, Collins, Ako, Martin, Akanzire, Bismark Nyaaba, Nantomah, Kassim Korah, Issaka, Cecilia, Zhai, Xiaoming

arXiv.org Artificial Intelligence

STEAM education in many parts of the Global South remains abstract and weakly connected to learners sociocultural realities. This study examines how human experts evaluate the capacity of Generative AI (GenAI) to contextualize STEAM instruction in these settings. Using a convergent mixed-methods design grounded in human-centered and culturally responsive pedagogy, four STEAM education experts reviewed standardized Ghana NaCCA lesson plans and GenAI-generated lessons created with a customized Culturally Responsive Lesson Planner (CRLP). Quantitative data were collected with a validated 25-item Culturally Responsive Pedagogy Rubric assessing bias awareness, cultural representation, contextual relevance, linguistic responsiveness, and teacher agency. Qualitative reflections provided additional insight into the pedagogical and cultural dynamics of each lesson. Findings show that GenAI, especially through the CRLP, improved connections between abstract standards and learners lived experiences. Teacher Agency was the strongest domain, while Cultural Representation was the weakest. CRLP-generated lessons were rated as more culturally grounded and pedagogically engaging. However, GenAI struggled to represent Ghana's cultural diversity, often producing surface-level references, especially in Mathematics and Computing. Experts stressed the need for teacher mediation, community input, and culturally informed refinement of AI outputs. Future work should involve classroom trials, broader expert participation, and fine-tuning with Indigenous corpora.


Boosting Reinforcement Learning in 3D Visuospatial Tasks Through Human-Informed Curriculum Design

Solbach, Markus D., Tsotsos, John K.

arXiv.org Artificial Intelligence

Reinforcement Learning is a mature technology, often suggested as a potential route towards Artificial General Intelligence, with the ambitious goal of replicating the wide range of abilities found in natural and artificial intelligence, including the complexities of human cognition. While RL had shown successes in relatively constrained environments, such as the classic Atari games and specific continuous control problems, recent years have seen efforts to expand its applicability. This work investigates the potential of RL in demonstrating intelligent behaviour and its progress in addressing more complex and less structured problem domains. We present an investigation into the capacity of modern RL frameworks in addressing a seemingly straightforward 3D Same-Different visuospatial task. While initial applications of state-of-the-art methods, including PPO, behavioural cloning and imitation learning, revealed challenges in directly learning optimal strategies, the successful implementation of curriculum learning offers a promising avenue. Effective learning was achieved by strategically designing the lesson plan based on the findings of a real-world human experiment.


An Evaluation of the Pedagogical Soundness and Usability of AI-Generated Lesson Plans Across Different Models and Prompt Frameworks in High-School Physics

Liu, Xincheng

arXiv.org Artificial Intelligence

This study evaluates the pedagogical soundness and usability of AI-generated lesson plans across five leading large language models: ChatGPT (GPT-5), Claude Sonnet 4.5, Gemini 2.5 Flash, DeepSeek V3.2, and Grok 4. Beyond model choice, three structured prompt frameworks were tested: TAG (Task, Audience, Goal), RACE (Role, Audience, Context, Execution), and COSTAR (Context, Objective, Style, Tone, Audience, Response Format). Fifteen lesson plans were generated for a single high-school physics topic, The Electromagnetic Spectrum. The lesson plans were analyzed through four automated computational metrics: (1) readability and linguistic complexity, (2) factual accuracy and hallucination detection, (3) standards and curriculum alignment, and (4) cognitive demand of learning objectives. Results indicate that model selection exerted the strongest influence on linguistic accessibility, with DeepSeek producing the most readable teaching plan (FKGL = 8.64) and Claude generating the densest language (FKGL = 19.89). The prompt framework structure most strongly affected the factual accuracy and pedagogical completeness, with the RACE framework yielding the lowest hallucination index and the highest incidental alignment with NGSS curriculum standards. Across all models, the learning objectives in the fifteen lesson plans clustered at the Remember and Understand tiers of Bloom's taxonomy. There were limited higher-order verbs in the learning objectives extracted. Overall, the findings suggest that readability is significantly governed by model design, while instructional reliability and curricular alignment depend more on the prompt framework. The most effective configuration for lesson plans identified in the results was to combine a readability-optimized model with the RACE framework and an explicit checklist of physics concepts, curriculum standards, and higher-order objectives.


HESEIA: A community-based dataset for evaluating social biases in large language models, co-designed in real school settings in Latin America

Ivetta, Guido, Gomez, Marcos J., Martinelli, Sofía, Palombini, Pietro, Echeveste, M. Emilia, Mazzeo, Nair Carolina, Busaniche, Beatriz, Benotti, Luciana

arXiv.org Artificial Intelligence

Most resources for evaluating social biases in Large Language Models are developed without co-design from the communities affected by these biases, and rarely involve participatory approaches. We introduce HESEIA, a dataset of 46,499 sentences created in a professional development course. The course involved 370 high-school teachers and 5,370 students from 189 Latin-American schools. Unlike existing benchmarks, HESEIA captures intersectional biases across multiple demographic axes and school subjects. It reflects local contexts through the lived experience and pedagogical expertise of educators. Teachers used minimal pairs to create sentences that express stereotypes relevant to their school subjects and communities. We show the dataset diversity in term of demographic axes represented and also in terms of the knowledge areas included. We demonstrate that the dataset contains more stereotypes unrecognized by current LLMs than previous datasets. HESEIA is available to support bias assessments grounded in educational communities.


Chatbots will be able to teach children TWICE as fast as teachers in the next 10 years, says the 'godfather of AI'

Daily Mail - Science & tech

Chatbots will be able to teach children more than twice as fast as teachers can within the next decade, the so-called godfather of AI has predicted. Geoffrey Hinton, who won a Nobel Prize for his work on the technology, also claimed AI personal tutors would'be much more efficient and less boring'. Speaking at Gitex Europe, the British computer scientist said: 'It's not there yet, but it's coming, and so we'll get much better education at many levels.' AI personal tutors are already being trialled in UK schools, with the technology now able to talk directly to the student and adapt lesson plans to their knowledge level. The government has already funnelled millions of pounds into AI education initiatives – though it has claimed the technology will'absolutely not' replace teachers.


LitLinker: Supporting the Ideation of Interdisciplinary Contexts with Large Language Models for Teaching Literature in Elementary Schools

Fan, Haoxiang, Zhou, Changshuang, Yu, Hao, Wu, Xueyang, Gu, Jiangyu, Peng, Zhenhui

arXiv.org Artificial Intelligence

Teaching literature under interdisciplinary contexts (e.g., science, art) that connect reading materials has become popular in elementary schools. However, constructing such contexts is challenging as it requires teachers to explore substantial amounts of interdisciplinary content and link it to the reading materials. In this paper, we develop LitLinker via an iterative design process involving 13 teachers to facilitate the ideation of interdisciplinary contexts for teaching literature. Powered by a large language model (LLM), LitLinker can recommend interdisciplinary topics and contextualize them with the literary elements (e.g., paragraphs, viewpoints) in the reading materials. A within-subjects study (N=16) shows that compared to an LLM chatbot, LitLinker can improve the integration depth of different subjects and reduce workload in this ideation task. Expert interviews (N=9) also demonstrate LitLinker's usefulness for supporting the ideation of interdisciplinary contexts for teaching literature. We conclude with concerns and design considerations for supporting interdisciplinary teaching with LLMs.


Inside the New Nonprofit AI Initiatives Seeking to Aid Teachers and Farmers in Rural Africa

TIME - Tech

Over the past year, rural farmers in Malawi have been seeking advice about their crops and animals from a generative AI chatbot. These farmers ask questions in Chichewa, their native tongue, and the app, Ulangizi, responds in kind, using conversational language based on information taken from the government's agricultural manual. "In the past we could wait for days for agriculture extension workers to come and address whatever problems we had on our farms," Maron Galeta, a Malawian farmer, told Bloomberg. "Just a touch of a button we have all the information we need." The nonprofit behind the app, Opportunity International, hopes to bring similar AI-based solutions to other impoverished communities.


Here's how ed-tech companies are pitching AI to teachers

MIT Technology Review

But this year, more and more educational technology companies are pitching schools on a different use of AI. Rather than scrambling to tamp down the use of it in the classroom, these companies are coaching teachers how to use AI tools to cut down on time they spend on tasks like grading, providing feedback to students, or planning lessons. One company, called Magic School, says its AI tools like quiz generators and text summarizers are used by 2.5 million educators. Khan Academy offers a digital tutor called Khanmigo, which it bills to teachers as "your free, AI-powered teaching assistant." Teachers can use it to assist students in subjects ranging from coding to humanities.


New Curriculum, New Chance -- Retrieval Augmented Generation for Lesson Planning in Ugandan Secondary Schools. Prototype Quality Evaluation

Kloker, Simon, Bukoli, Herbertson, Kateete, Twaha

arXiv.org Artificial Intelligence

Introduction: Poor educational quality in Secondary Schools is still regarded as one of the major struggles in 21st century Uganda - especially in rural areas. Research identifies several problems, including low quality or absent teacher lesson planning. As the government pushes towards the implementation of a new curriculum, exiting lesson plans become obsolete and the problem is worsened. Using a Retrieval Augmented Generation approach, we developed a prototype that generates customized lesson plans based on the government-accredited textbooks. This helps teachers create lesson plans more efficiently and with better quality, ensuring they are fully aligned the new curriculum and the competence-based learning approach. Methods: The prototype was created using Cohere LLM and Sentence Embeddings, and LangChain Framework - and thereafter made available on a public website. Vector stores were trained for three new curriculum textbooks (ICT, Mathematics, History), all at Secondary 1 Level. Twenty-four lessons plans were generated following a pseudo-random generation protocol, based on the suggested periods in the textbooks. The lesson plans were analyzed regarding their technical quality by three independent raters following the Lesson Plan Analysis Protocol (LPAP) by Ndihokubwayo et al. (2022) that is specifically designed for East Africa and competence-based curriculums. Results: Evaluation of 24 lesson plans using the LPAP resulted in an average quality of between 75 and 80%, corresponding to "very good lesson plan". None of the lesson plans scored below 65%, although one lesson plan could be argued to have been missing the topic. In conclusion, the quality of the generated lesson plans is at least comparable, if not better, than those created by humans, as demonstrated in a study in Rwanda, whereby no lesson plan even reached the benchmark of 50%.


To what extent is ChatGPT useful for language teacher lesson plan creation?

Dornburg, Alex, Davin, Kristin

arXiv.org Artificial Intelligence

The advent of generative AI models holds tremendous potential for aiding teachers in the generation of pedagogical materials. However, numerous knowledge gaps concerning the behavior of these models obfuscate the generation of research-informed guidance for their effective usage. Here we assess trends in prompt specificity, variability, and weaknesses in foreign language teacher lesson plans generated by zero-shot prompting in ChatGPT. Iterating a series of prompts that increased in complexity, we found that output lesson plans were generally high quality, though additional context and specificity to a prompt did not guarantee a concomitant increase in quality. Additionally, we observed extreme cases of variability in outputs generated by the same prompt. In many cases, this variability reflected a conflict between 20th century versus 21st century pedagogical practices. These results suggest that the training of generative AI models on classic texts concerning pedagogical practices may represent a currently underexplored topic with the potential to bias generated content towards teaching practices that have been long refuted by research. Collectively, our results offer immediate translational implications for practicing and training foreign language teachers on the use of AI tools. More broadly, these findings reveal the existence of generative AI output trends that have implications for the generation of pedagogical materials across a diversity of content areas.